Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 500 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 39.2 KiB |
| Average record size in memory | 80.3 B |
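The two memory rows are linked by simple arithmetic: the average record size is the total in-memory size divided by the number of observations. A minimal sketch, using only the rounded figures from the table above (so the result is approximate; the variable names are illustrative and not part of the report):

```python
# Figures copied from the table above (already rounded by the report).
n_variables = 10
n_observations = 500
total_size_kib = 39.2
missing_cells = 0

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100.0 * missing_cells / total_cells

# 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"cells: {total_cells}, missing: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The second line prints 80.3 B, which agrees with the reported average record size.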
Variable types
| NUM | 7 |
|---|---|
| CAT | 3 |
Warnings
| car name has a high cardinality: 77 distinct values | High cardinality |
|---|---|
| id has unique values | Unique |
| mpg has unique values | Unique |
| acceleration has unique values | Unique |
Reproduction
| Analysis started | 2020-11-01 15:29:24.417281 |
|---|---|
| Analysis finished | 2020-11-01 15:29:47.396618 |
| Duration | 22.98 seconds |
| Software version | pandas-profiling v2.9.0 |
id
Real number (ℝ≥0)
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 500.176 |
|---|---|
| Minimum | 0 |
| Maximum | 997 |
| Zeros | 1 |
| Zeros (%) | 0.2% |
| Memory size | 3.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 56.95 |
| Q1 | 242.25 |
| median | 513 |
| Q3 | 750.25 |
| 95-th percentile | 943.2 |
| Maximum | 997 |
| Range | 997 |
| Interquartile range (IQR) | 508 |
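The range and the interquartile range follow directly from the other rows: range = maximum − minimum and IQR = Q3 − Q1 (here 750.25 − 242.25 = 508). The sketch below computes the same quantities for a small toy sample. It assumes linear interpolation between neighbouring order statistics, which is the default in NumPy and pandas; the helper `quantile` is written for illustration and is not the report's own code.

```python
def quantile(values, q):
    """Quantile with linear interpolation between the two nearest order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)      # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Toy sample for illustration only, not the full column.
sample = [0, 3, 4, 7, 9, 11, 13, 16, 19, 23]

q1 = quantile(sample, 0.25)
median = quantile(sample, 0.50)
q3 = quantile(sample, 0.75)
iqr = q3 - q1
value_range = max(sample) - min(sample)

print(q1, median, q3, iqr, value_range)
```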
Descriptive statistics
| Standard deviation | 288.6571789 |
|---|---|
| Coefficient of variation (CV) | 0.5771112147 |
| Kurtosis | -1.223066581 |
| Mean | 500.176 |
| Median Absolute Deviation (MAD) | 250.5 |
| Skewness | -0.03182781094 |
| Sum | 250088 |
| Variance | 83322.96696 |
| Monotonicity | Strictly increasing |
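Several of the descriptive statistics above are tied together by simple identities: mean = sum / n, variance = (standard deviation)², and CV = standard deviation / mean. The sketch below checks these against the reported figures and adds a hypothetical `monotonicity` helper showing one way the last row could be derived. The label strings are assumed to match the report's wording.

```python
# Figures copied from the tables above.
n_observations = 500
total = 250088
mean = 500.176
std = 288.6571789
variance = 83322.96696
cv = 0.5771112147

mean_from_sum = total / n_observations      # mean = sum / n
variance_from_std = std ** 2                # variance = std squared
cv_from_std = std / mean                    # CV = std / mean

def monotonicity(values):
    """Classify a sequence by comparing each value with its successor."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"

print(mean_from_sum, variance_from_std, cv_from_std)
print(monotonicity([0, 3, 4, 7, 9]))        # five smallest values of this variable
```

The recomputed values agree with the table up to the rounding of the reported figures.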
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 997 | 1 | 0.2% | |
| 343 | 1 | 0.2% | |
| 321 | 1 | 0.2% | |
| 322 | 1 | 0.2% | |
| 323 | 1 | 0.2% | |
| 324 | 1 | 0.2% | |
| 326 | 1 | 0.2% | |
| 327 | 1 | 0.2% | |
| 328 | 1 | 0.2% | |
| 329 | 1 | 0.2% | |
| Other values (490) | 490 | 98.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 0 | 1 | 0.2% | |
| 3 | 1 | 0.2% | |
| 4 | 1 | 0.2% | |
| 7 | 1 | 0.2% | |
| 9 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 997 | 1 | 0.2% | |
| 995 | 1 | 0.2% | |
| 994 | 1 | 0.2% | |
| 983 | 1 | 0.2% | |
| 981 | 1 | 0.2% |
mpg
Real number (ℝ≥0)
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 27.01093994 |
|---|---|
| Minimum | 15.78761298 |
| Maximum | 44.7638971 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.9 KiB |
Quantile statistics
| Minimum | 15.78761298 |
|---|---|
| 5-th percentile | 17.20086215 |
| Q1 | 22.39664058 |
| median | 26.2289843 |
| Q3 | 35.08833319 |
| 95-th percentile | 36.40464199 |
| Maximum | 44.7638971 |
| Range | 28.97628412 |
| Interquartile range (IQR) | 12.69169261 |
Descriptive statistics
| Standard deviation | 7.356248557 |
|---|---|
| Coefficient of variation (CV) | 0.2723433014 |
| Kurtosis | -0.8799047176 |
| Mean | 27.01093994 |
| Median Absolute Deviation (MAD) | 7.479963315 |
| Skewness | 0.3626768372 |
| Sum | 13505.46997 |
| Variance | 54.11439284 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 23.2943741 | 1 | 0.2% | |
| 35.94487152 | 1 | 0.2% | |
| 17.61331358 | 1 | 0.2% | |
| 28.69833784 | 1 | 0.2% | |
| 34.66772443 | 1 | 0.2% | |
| 23.16728215 | 1 | 0.2% | |
| 26.71916695 | 1 | 0.2% | |
| 35.6918206 | 1 | 0.2% | |
| 22.8104364 | 1 | 0.2% | |
| 28.38035121 | 1 | 0.2% | |
| Other values (490) | 490 | 98.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 15.78761298 | 1 | 0.2% | |
| 15.99133885 | 1 | 0.2% | |
| 16.27513541 | 1 | 0.2% | |
| 16.30387949 | 1 | 0.2% | |
| 16.40398691 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 44.7638971 | 1 | 0.2% | |
| 44.7184269 | 1 | 0.2% | |
| 44.68008394 | 1 | 0.2% | |
| 44.53396354 | 1 | 0.2% | |
| 44.45964949 | 1 | 0.2% |
cylinders
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4 | 305 | 61.0% | |
| 8 | 103 | 20.6% | |
| 6 | 92 | 18.4% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
displacement
Real number (ℝ≥0)
| Distinct | 43 |
|---|---|
| Distinct (%) | 8.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 194.762 |
|---|---|
| Minimum | 79 |
| Maximum | 429 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.9 KiB |
Quantile statistics
| Minimum | 79 |
|---|---|
| 5-th percentile | 89 |
| Q1 | 104 |
| median | 140 |
| Q3 | 302 |
| 95-th percentile | 400 |
| Maximum | 429 |
| Range | 350 |
| Interquartile range (IQR) | 198 |
Descriptive statistics
| Standard deviation | 106.2774253 |
|---|---|
| Coefficient of variation (CV) | 0.5456784452 |
| Kurtosis | -0.7672025598 |
| Mean | 194.762 |
| Median Absolute Deviation (MAD) | 43 |
| Skewness | 0.807167207 |
| Sum | 97381 |
| Variance | 11294.89114 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=43)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 140 | 65 | 13.0% | |
| 97 | 41 | 8.2% | |
| 302 | 38 | 7.6% | |
| 400 | 30 | 6.0% | |
| 318 | 27 | 5.4% | |
| 90 | 22 | 4.4% | |
| 350 | 20 | 4.0% | |
| 104 | 20 | 4.0% | |
| 200 | 19 | 3.8% | |
| 151 | 16 | 3.2% | |
| Other values (33) | 202 | 40.4% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 79 | 1 | 0.2% | |
| 80 | 4 | 0.8% | |
| 85 | 10 | 2.0% | |
| 88 | 2 | 0.4% | |
| 89 | 11 | 2.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 429 | 15 | 3.0% | |
| 400 | 30 | 6.0% | |
| 360 | 2 | 0.4% | |
| 351 | 2 | 0.4% | |
| 350 | 20 | 4.0% |
horsepower
Real number (ℝ≥0)
| Distinct | 40 |
|---|---|
| Distinct (%) | 8.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 106.8452138 |
|---|---|
| Minimum | 54 |
| Maximum | 220 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.9 KiB |
Quantile statistics
| Minimum | 54 |
|---|---|
| 5-th percentile | 61 |
| Q1 | 85 |
| median | 100 |
| Q3 | 130 |
| 95-th percentile | 150 |
| Maximum | 220 |
| Range | 166 |
| Interquartile range (IQR) | 45 |
Descriptive statistics
| Standard deviation | 35.27743567 |
|---|---|
| Coefficient of variation (CV) | 0.330173289 |
| Kurtosis | 1.216583084 |
| Mean | 106.8452138 |
| Median Absolute Deviation (MAD) | 23.5 |
| Skewness | 1.042661291 |
| Sum | 53422.60692 |
| Variance | 1244.497467 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=40)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 85 | 47 | 9.4% | |
| 150 | 44 | 8.8% | |
| 97 | 41 | 8.2% | |
| 110 | 40 | 8.0% | |
| 67 | 39 | 7.8% | |
| 100 | 29 | 5.8% | |
| 90 | 27 | 5.4% | |
| 148 | 21 | 4.2% | |
| 60 | 20 | 4.0% | |
| 71 | 17 | 3.4% | |
| Other values (30) | 175 | 35.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 54 | 3 | 0.6% | |
| 58 | 1 | 0.2% | |
| 60 | 20 | 4.0% | |
| 61 | 2 | 0.4% | |
| 64 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 220 | 14 | 2.8% | |
| 193 | 6 | 1.2% | |
| 165 | 2 | 0.4% | |
| 150 | 44 | 8.8% | |
| 148 | 21 | 4.2% |
weight
Real number (ℝ≥0)
| Distinct | 79 |
|---|---|
| Distinct (%) | 15.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2719.714 |
|---|---|
| Minimum | 1755 |
| Maximum | 4732 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.9 KiB |
Quantile statistics
| Minimum | 1755 |
|---|---|
| 5-th percentile | 1875 |
| Q1 | 2178.75 |
| median | 2615 |
| Q3 | 3193 |
| 95-th percentile | 4275.05 |
| Maximum | 4732 |
| Range | 2977 |
| Interquartile range (IQR) | 1014.25 |
Descriptive statistics
| Standard deviation | 717.0354104 |
|---|---|
| Coefficient of variation (CV) | 0.2636436811 |
| Kurtosis | 0.08925493619 |
| Mean | 2719.714 |
| Median Absolute Deviation (MAD) | 491 |
| Skewness | 0.9191024558 |
| Sum | 1359857 |
| Variance | 514139.7798 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 3193 | 35 | 7.0% | |
| 2000 | 31 | 6.2% | |
| 2300 | 30 | 6.0% | |
| 3233 | 24 | 4.8% | |
| 2774 | 22 | 4.4% | |
| 1875 | 22 | 4.4% | |
| 2200 | 20 | 4.0% | |
| 2815 | 16 | 3.2% | |
| 2123 | 15 | 3.0% | |
| 2245 | 14 | 2.8% | |
| Other values (69) | 271 | 54.2% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1755 | 2 | 0.4% | |
| 1760 | 12 | 2.4% | |
| 1875 | 22 | 4.4% | |
| 1925 | 1 | 0.2% | |
| 1955 | 3 | 0.6% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 4732 | 3 | 0.6% | |
| 4638 | 1 | 0.2% | |
| 4464 | 1 | 0.2% | |
| 4456 | 13 | 2.6% | |
| 4376 | 1 | 0.2% |
acceleration
Real number (ℝ≥0)
| Distinct | 500 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15.3003277 |
|---|---|
| Minimum | 9.530858797 |
| Maximum | 21.92251057 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.9 KiB |
Quantile statistics
| Minimum | 9.530858797 |
|---|---|
| 5-th percentile | 13.04329295 |
| Q1 | 13.44156163 |
| median | 15.23192308 |
| Q3 | 17.19053114 |
| 95-th percentile | 19.34364274 |
| Maximum | 21.92251057 |
| Range | 12.39165177 |
| Interquartile range (IQR) | 3.74896951 |
Descriptive statistics
| Standard deviation | 2.261096048 |
|---|---|
| Coefficient of variation (CV) | 0.1477808902 |
| Kurtosis | 0.516151278 |
| Mean | 15.3003277 |
| Median Absolute Deviation (MAD) | 1.85832857 |
| Skewness | 0.2892277264 |
| Sum | 7650.163849 |
| Variance | 5.112555338 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 14.87498839 | 1 | 0.2% | |
| 15.05676058 | 1 | 0.2% | |
| 13.38957282 | 1 | 0.2% | |
| 13.74991059 | 1 | 0.2% | |
| 13.04427743 | 1 | 0.2% | |
| 15.31706211 | 1 | 0.2% | |
| 15.41902455 | 1 | 0.2% | |
| 14.99155532 | 1 | 0.2% | |
| 17.85633313 | 1 | 0.2% | |
| 9.719183764 | 1 | 0.2% | |
| Other values (490) | 490 | 98.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 9.530858797 | 1 | 0.2% | |
| 9.559641329 | 1 | 0.2% | |
| 9.57878946 | 1 | 0.2% | |
| 9.590617368 | 1 | 0.2% | |
| 9.621400371 | 1 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 21.92251057 | 1 | 0.2% | |
| 21.88568819 | 1 | 0.2% | |
| 21.75093686 | 1 | 0.2% | |
| 21.67368634 | 1 | 0.2% | |
| 21.60600638 | 1 | 0.2% |
model year
Real number (ℝ≥0)
| Distinct | 13 |
|---|---|
| Distinct (%) | 2.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 76.332 |
|---|---|
| Minimum | 70 |
| Maximum | 82 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 3.9 KiB |
Quantile statistics
| Minimum | 70 |
|---|---|
| 5-th percentile | 70 |
| Q1 | 73 |
| median | 76 |
| Q3 | 80 |
| 95-th percentile | 82 |
| Maximum | 82 |
| Range | 12 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 3.909007121 |
|---|---|
| Coefficient of variation (CV) | 0.05121059479 |
| Kurtosis | -1.335199548 |
| Mean | 76.332 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.1839449657 |
| Sum | 38166 |
| Variance | 15.28033667 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=13)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 81 | 63 | 12.6% | |
| 71 | 62 | 12.4% | |
| 80 | 52 | 10.4% | |
| 76 | 48 | 9.6% | |
| 79 | 43 | 8.6% | |
| 70 | 35 | 7.0% | |
| 78 | 34 | 6.8% | |
| 82 | 33 | 6.6% | |
| 73 | 32 | 6.4% | |
| 75 | 29 | 5.8% | |
| Other values (3) | 69 | 13.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 70 | 35 | 7.0% | |
| 71 | 62 | 12.4% | |
| 72 | 19 | 3.8% | |
| 73 | 32 | 6.4% | |
| 74 | 27 | 5.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 82 | 33 | 6.6% | |
| 81 | 63 | 12.6% | |
| 80 | 52 | 10.4% | |
| 79 | 43 | 8.6% | |
| 78 | 34 | 6.8% |
origin
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 373 | 74.6% | |
| 3 | 83 | 16.6% | |
| 2 | 44 | 8.8% |
Frequencies of value counts
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Histogram of lengths of the category
Length
| Max length | 1 |
|---|---|
| Median length | 1 |
| Mean length | 1 |
| Min length | 1 |
car name
Categorical
| Distinct | 77 |
|---|---|
| Distinct (%) | 15.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 3.9 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| dodge monaco brougham | 27 | 5.4% | |
| datsun 200sx | 24 | 4.8% | |
| chevrolet nova | 21 | 4.2% | |
| vw rabbit | 18 | 3.6% | |
| pontiac astro | 17 | 3.4% | |
| ford pinto | 17 | 3.4% | |
| ford futura | 17 | 3.4% | |
| honda civic 1300 | 16 | 3.2% | |
| dodge rampage | 15 | 3.0% | |
| dodge aspen | 15 | 3.0% | |
| Other values (67) | 313 | 62.6% |
Frequencies of value counts
Unique
| Unique | 22 |
|---|---|
| Unique (%) | 4.4% |
Histogram of lengths of the category
Length
| Max length | 33 |
|---|---|
| Median length | 14 |
| Mean length | 14.77 |
| Min length | 8 |
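The categorical summaries above distinguish "Distinct" (how many different values occur) from "Unique" (how many values occur exactly once), and report statistics on the string lengths. The sketch below reproduces those quantities for a toy list of car names. It assumes lengths are measured per row rather than per distinct value, and it is an illustration, not the report's implementation.

```python
from collections import Counter
from statistics import mean, median

# Toy sample of car names that appear in this report; not the full column.
names = [
    "dodge aspen",
    "dodge rampage",
    "dodge aspen",
    "vw rabbit custom",
    "amc gremlin",
    "datsun 200sx",
]

counts = Counter(names)
n_distinct = len(counts)                                  # different values present
n_unique = sum(1 for c in counts.values() if c == 1)      # values occurring exactly once

lengths = [len(name) for name in names]
length_stats = {
    "max": max(lengths),
    "median": median(lengths),
    "mean": mean(lengths),
    "min": min(lengths),
}

print(n_distinct, n_unique)
print(length_stats)
```

In the toy sample, "dodge aspen" occurs twice, so it counts as distinct but not as unique. The percentages in the report follow the same logic: 77 / 500 = 15.4% distinct and 22 / 500 = 4.4% unique.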
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
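As a minimal sketch of the calculation just described (covariance divided by the product of the standard deviations), the following uses only the Python standard library. The function name `pearson_r` and the toy data are illustrative assumptions, not taken from the report.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)

# Shifting and (positively) rescaling either variable leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])

print(round(r, 4), round(r_rescaled, 4))
```

Both calls give the same value, illustrating the invariance under changes in location and scale.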
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
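A minimal sketch of this definition: convert each variable to ranks, then apply the Pearson formula to the ranks. Giving tied values the average of the ranks they occupy is a common convention and is assumed here; the function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]          # monotonic but nonlinear relationship

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, round(r, 4))
```

For this cubic relationship ρ is 1 (up to floating-point rounding) while Pearson's r stays below 1, which is the sense in which ρ captures nonlinear monotonic association.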
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
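The pair-counting definition above can be sketched directly. This is the simple variant often called tau-a, in which tied pairs count as neither concordant nor discordant. Libraries such as SciPy default to the tau-b variant, which additionally corrects for ties, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:          # both variables move in the same direction
            concordant += 1
        elif product < 0:        # the variables move in opposite directions
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

Here 7 of the 10 pairs are concordant and 3 are discordant, so τ = (7 − 3) / 10 = 0.4.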
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | id | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 23.059782 | 6 | 140 | 110.0 | 2815 | 17.977429 | 80 | 1 | dodge aspen |
| 1 | 3 | 17.674521 | 8 | 350 | 150.0 | 4456 | 13.514535 | 72 | 1 | dodge rampage |
| 2 | 4 | 17.136353 | 8 | 302 | 140.0 | 2774 | 13.209912 | 79 | 1 | mercury cougar brougham |
| 3 | 7 | 22.664666 | 6 | 400 | 85.0 | 2190 | 15.196381 | 71 | 1 | pontiac j2000 se hatchback |
| 4 | 9 | 17.872018 | 8 | 429 | 220.0 | 2245 | 9.621400 | 70 | 1 | ford galaxie 500 |
| 5 | 11 | 23.405007 | 6 | 140 | 110.0 | 2815 | 18.152362 | 80 | 1 | dodge aspen |
| 6 | 13 | 17.250298 | 6 | 318 | 110.0 | 3205 | 19.228868 | 75 | 1 | vw rabbit custom |
| 7 | 16 | 35.469676 | 4 | 140 | 165.0 | 2145 | 13.519583 | 82 | 1 | amc gremlin |
| 8 | 19 | 22.839820 | 6 | 200 | 85.0 | 3193 | 17.215803 | 71 | 1 | dodge monaco brougham |
| 9 | 23 | 36.489563 | 4 | 104 | 60.0 | 2000 | 14.899884 | 81 | 1 | datsun 200sx |
Last rows
| | id | mpg | cylinders | displacement | horsepower | weight | acceleration | model year | origin | car name |
|---|---|---|---|---|---|---|---|---|---|---|
| 490 | 974 | 22.490094 | 6 | 200 | 85.0 | 3193 | 17.210477 | 73 | 1 | dodge monaco brougham |
| 491 | 976 | 36.222958 | 4 | 98 | 67.0 | 2000 | 14.991555 | 79 | 1 | fiat 124 sport coupe |
| 492 | 977 | 22.224490 | 6 | 400 | 85.0 | 2711 | 17.113384 | 78 | 1 | honda civic 1300 |
| 493 | 978 | 17.344275 | 8 | 400 | 193.0 | 4732 | 12.956417 | 70 | 1 | hi 1200d |
| 494 | 980 | 22.739561 | 4 | 318 | 139.0 | 2525 | 13.294111 | 78 | 1 | ford futura |
| 495 | 981 | 22.798447 | 4 | 140 | 148.0 | 2835 | 13.477573 | 82 | 1 | datsun 200-sx |
| 496 | 983 | 35.173640 | 4 | 97 | 67.0 | 2234 | 17.542681 | 80 | 3 | plymouth valiant |
| 497 | 994 | 17.825448 | 8 | 302 | 220.0 | 2774 | 15.177189 | 76 | 1 | triumph tr7 coupe |
| 498 | 995 | 28.545147 | 4 | 97 | 150.0 | 2130 | 13.324669 | 70 | 1 | datsun pl510 |
| 499 | 997 | 36.011880 | 4 | 97 | 150.0 | 2300 | 15.364361 | 71 | 1 | chevrolet nova |